ABSTRACT


A discussion is made of nonparametric versus parametric methods for the estimation of
probability densities. A new algorithm for nonparametric density estimation is given and
its performance compared with state-of-the-art kernel estimation algorithms.

Key words: computational feasibility, maximum likelihood, Pearson family, kernel estimates,
penalized maximum likelihood.

1.  INTRODUCTION 


Two major causes for poor (especially nonrobust) optimization theoretic techniques in
statistics are

(1) an inappropriate choice of a parameter (function) space

and

(2) an inappropriate choice of a criterion function (functional).

"Appropriateness" is determined by a balance between computational feasibility and
approximation to truth. It is to be expected that the advent of the high speed digital computer
should drastically raise our pain threshold of computational feasibility. Consequently it is
somewhat surprising that most standard statistical procedures have remained unchanged since
the 1930's. Many of these involve the estimation of probability densities.

2. DISCUSSION

In 1922 Fisher [1] presented the concept of parametric maximum likelihood estimation.

We recall that his development requires that the functional form of the unknown density
$f(x \mid \theta)$ be known. Given a random sample $\{x_1, x_2, \ldots, x_n\}$ from f, we seek
that value $\hat{\theta}_n$ contained in an appropriate parameter space $\Theta \subset R$
which maximizes

$$\log L(\theta) = \sum_{j=1}^{n} \log f(x_j \mid \theta). \tag{1}$$

Then  under  very  general  conditions. 
The latter result is particularly appealing, since it states that the parametric maximum
likelihood estimator asymptotically achieves the Cauchy-Schwarz (Cramer-Rao) lower bound
for $E[(\tilde{\theta} - \theta)^2]$, where $\tilde{\theta} \in \tilde{\Theta}$, the class of
unbiased estimates for $\theta$.
The optimality properties of parametric maximum likelihood algorithms are likely to be
of little utility if (as is generally the case) we do not have a good idea as to the
functional form of the unknown density. For example, if we assume the density is normal, the
maximum likelihood estimator for the median $\theta$ is $\bar{x}$. If, in fact, the underlying
distribution is Cauchy, $\bar{x}$ is no better an estimator for $\theta$ than any single one of the
observations. In general, if we assume an incorrect functional form of the density and use
any of the classical parametric techniques for estimating the density, we will find that


$$\lim_{n \to \infty} E \int_{-\infty}^{\infty} \left( f_{\mathrm{est},n}(x) - f_{\mathrm{true}}(x) \right)^2 dx > 0. \tag{4}$$
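
The Cauchy example can be checked numerically. The following is a minimal sketch that is not part of the original study: it draws standard Cauchy samples by the inverse-CDF method and compares the spread of the sample mean with that of the sample median and of a single observation. The function names, the seed, the sample size and the number of replications are illustrative assumptions, and the interquartile range is used as the spread measure because the Cauchy variance does not exist.

```python
import math
import random
import statistics


def cauchy_sample(n, rng, location=0.0):
    """Draw n Cauchy variates centred at `location` (inverse-CDF method)."""
    return [location + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def interquartile_range(values):
    """Robust spread measure; the variance of a Cauchy variate does not exist."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def compare_estimators(n=100, replications=2000, seed=0):
    """Spread of three location estimators over repeated Cauchy samples."""
    rng = random.Random(seed)
    means, medians, singles = [], [], []
    for _ in range(replications):
        x = cauchy_sample(n, rng)
        means.append(statistics.fmean(x))    # normal-theory MLE of location
        medians.append(statistics.median(x))
        singles.append(x[0])                 # one observation on its own
    return {
        "mean": interquartile_range(means),
        "median": interquartile_range(medians),
        "single observation": interquartile_range(singles),
    }


if __name__ == "__main__":
    for name, iqr in compare_estimators().items():
        print(f"IQR of {name}: {iqr:.3f}")
```

Under these assumptions the sample mean of 100 Cauchy observations should show about the same spread as a single observation, while the sample median is far more concentrated.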


The pathology of parametric maximum likelihood estimation under real world conditions
should not be unexpected. An optimization-theoretic technique designed to have good
performance under very restrictive conditions (e.g., that the functional form of the density
is known) is unlikely to perform well when we step outside the domain of these conditions.
We need to devise algorithms which are "optimal" in a more general and realistic setting.
This point was implicitly raised a quarter century before maximum likelihood by Karl
Pearson [7]. (For a discussion of the Fisher-Pearson battle on maximum likelihood, the
reader is referred to [13].) He considered a fairly large class of probability densities
characterized by the differential equation


$$\frac{d \log f(x)}{dx} = \frac{x - a}{b_0 + b_1 x + b_2 x^2}. \tag{5}$$


The estimation of the four parameters is readily carried out via the first four sample
moments. Unfortunately, although the Pearson family contains many of the classical
distributions, it has serious deficiencies. For example, it contains no multimodal densities.
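
As a worked special case (an illustration added here, using only the differential equation above), take $b_1 = b_2 = 0$ and write $b_0 = -\sigma^2 < 0$; the symbols $\sigma$ and $c$ are introduced for this illustration only:

```latex
\frac{d \log f(x)}{dx} = \frac{x - a}{b_0}
\;\Longrightarrow\;
\log f(x) = \frac{(x - a)^2}{2 b_0} + c
\;\Longrightarrow\;
f(x) \propto \exp\!\left( -\frac{(x - a)^2}{2 \sigma^2} \right),
\qquad b_0 = -\sigma^2 < 0 .
```

This is the normal density with mean $a$ and variance $\sigma^2$. More generally, the equation gives $f'(x) = f(x)\,(x - a)/(b_0 + b_1 x + b_2 x^2)$, so wherever $f > 0$ and the denominator is nonzero the derivative can vanish only at $x = a$. A density of this form therefore has at most one interior stationary point, which is why the family cannot represent several modes.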

In order to obtain a practical extension of Pearson's concept to density estimation in
the general setting where we know only that the underlying density is "smooth", we must
develop an estimator where the number of characterizing parameters increases with the sample
size. The simple histogram (dating back to John Graunt in 1662 [3]) has such a property
but suffers from discontinuities. These may be eliminated quite readily by connecting
midpoints with straight lines. The extreme "locality" of the histogram is less easily
ameliorated.
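
The histogram and its midpoint-connected variant can be sketched as follows. This is an illustrative implementation under assumptions of our own (equal-width bins, a user-chosen origin, and the polygon brought down to zero one half-bin beyond each end so that it still integrates to one); the names `histogram` and `frequency_polygon` are hypothetical.

```python
import math


def histogram(sample, origin, width, bins):
    """Heights of a density histogram with `bins` cells of equal `width`."""
    counts = [0] * bins
    for x in sample:
        k = math.floor((x - origin) / width)
        if 0 <= k < bins:
            counts[k] += 1
    n = len(sample)
    return [c / (n * width) for c in counts]


def frequency_polygon(heights, origin, width):
    """Continuous estimate obtained by joining bin midpoints with straight lines."""
    # Add an empty bin on each side so the polygon returns to zero.
    mids = [origin + (k + 0.5) * width for k in range(-1, len(heights) + 1)]
    vals = [0.0] + list(heights) + [0.0]

    def f(x):
        if x <= mids[0] or x >= mids[-1]:
            return 0.0
        k = min(int((x - mids[0]) // width), len(mids) - 2)
        t = (x - mids[k]) / width          # position between two midpoints
        return (1 - t) * vals[k] + t * vals[k + 1]

    return f


if __name__ == "__main__":
    data = [0.1, 0.4, 0.45, 0.8, 1.2, 1.3, 1.9]
    heights = histogram(data, origin=0.0, width=0.5, bins=4)
    polygon = frequency_polygon(heights, origin=0.0, width=0.5)
    print(heights)
    print(polygon(0.5))   # value at the boundary between the first two bins
```

The polygon removes the jumps at the bin edges, but each value still depends only on the counts in the two nearest bins, which is the "locality" referred to above.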

Computationally more complicated but possessing better consistency properties than the
histogram is the kernel density estimator (or "shifted histogram" [12], [6], [8]). Here, on
the basis of a random sample $\{x_1, x_2, \ldots, x_n\}$ we have the estimator

$$\hat{f}_n(x) = \frac{1}{nh} \sum_{j=1}^{n} K\!\left( \frac{x - x_j}{h} \right), \tag{6}$$

where K is any probability density having

$$\int_{-\infty}^{\infty} |K(y)|\,dy < \infty \tag{7}$$

$$\sup_{-\infty < y < \infty} |K(y)| < \infty \tag{8}$$

$$\lim_{|y| \to \infty} |yK(y)| = 0. \tag{9}$$

To minimize the asymptotic integrated mean square error, we have the optimal
which gives as asymptotic integrated mean square error

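$$\mathrm{IMSE} = \frac{5}{4} \left[ \int K^2(y)\,dy \right]^{4/5} \left[ \int y^2 K(y)\,dy \right]^{2/5} \left[ \int (f''(x))^2\,dx \right]^{1/5} n^{-4/5}. \tag{11}$$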


Unfortunately, the design parameter h requires approximate knowledge of $\int (f''(x))^2\,dx$.

An iterative algorithm for the estimation of h is given in [12]. Monte Carlo results
indicate that a twofold overestimation or underestimation of h typically causes a twofold
increase of the IMSE over that shown in (11). A survey of other nonparametric
density estimation techniques is given in [13].
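
A minimal sketch of a kernel estimator with a Gaussian kernel is given below. It is not the iterative algorithm of [12]. The bandwidth rule in `normal_reference_h` is an assumption added for illustration: it is the standard asymptotically optimal h when the kernel is Gaussian and the true density is itself normal with standard deviation sigma, in which case $\int (f''(x))^2\,dx = 3/(8\sqrt{\pi}\,\sigma^5)$. All function names are hypothetical.

```python
import math
import random


def gaussian_kernel(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def kernel_estimate(sample, h, kernel=gaussian_kernel):
    """Return the function x -> (1/(n h)) * sum_j K((x - x_j) / h)."""
    n = len(sample)

    def f_hat(x):
        return sum(kernel((x - xj) / h) for xj in sample) / (n * h)

    return f_hat


def normal_reference_h(n, sigma=1.0):
    """Assumed rule: optimal h for a Gaussian kernel and an N(mu, sigma^2) density.

    With int K^2 = 1/(2 sqrt(pi)) and int y^2 K = 1 this is (4/(3n))^(1/5) * sigma.
    """
    return (4.0 / (3.0 * n)) ** 0.2 * sigma


def integrated_squared_error(f_hat, f_true, lo, hi, steps=400):
    """Midpoint-rule approximation of the integral of (f_hat - f_true)^2 on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += (f_hat(x) - f_true(x)) ** 2
    return total * dx


if __name__ == "__main__":
    rng = random.Random(1)
    n = 100
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    h_opt = normal_reference_h(n)
    # Compare the assumed-optimal h with a twofold under- and overestimate.
    for factor in (0.5, 1.0, 2.0):
        f_hat = kernel_estimate(sample, factor * h_opt)
        ise = integrated_squared_error(f_hat, gaussian_kernel, -5.0, 5.0)
        print(f"h = {factor * h_opt:.3f}  ISE = {ise:.5f}")
```

For n = 100 and sigma = 1 the assumed rule gives h of roughly 0.42. The printed values are integrated squared errors for one sample only; averaging them over many samples would be needed to approximate the IMSE.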

A new approach motivated by a suggestion of Good [2] has been considered in [4], [5],
[11], [13]. Here we seek that density $f \in H^s(a,b)$ which maximizes the criterion functional


$$\ell(f) = \sum_{j=1}^{n} \log f(x_j) - \sum_{k=0}^{s} \alpha_k \int_a^b \left( f^{(k)}(x) \right)^2 dx, \tag{12}$$

subject to

$$f^{(k)} \in L^2(a,b); \quad k = 0, 1, \ldots, s$$

$$f^{(k)}(a) = f^{(k)}(b) = 0; \quad k = 0, 1, 2, \ldots, s-1$$

$$f \ge 0$$

$$\int_a^b f(x)\,dx = 1.$$


The solution to (12) is referred to as the maximum penalized likelihood estimator (MPLE).
From [5] we have

Theorem. The MPLE estimator exists and is unique.
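
To make the criterion concrete, the sketch below evaluates a discretized version of it for a candidate density. It is an illustration under assumptions of our own and is not the algorithm of [10], [11]: the candidate is piecewise linear on an equally spaced mesh over (a, b), it vanishes at both end points, only the second-derivative penalty is kept (weight `alpha2`), and second differences stand in for the second derivative. No maximization is performed; the code only scores a given candidate.

```python
import math


def penalized_log_likelihood(values, a, b, sample, alpha2):
    """Discretized criterion: sum_j log f(x_j) - alpha2 * integral of (f'')^2.

    `values` are the heights of f at the interior nodes of an equally spaced
    mesh on (a, b); f is 0 at a and b and linear between nodes. Every sample
    point must lie where f is positive.
    """
    heights = [0.0] + list(values) + [0.0]
    m = len(heights) - 1                  # number of mesh cells
    delta = (b - a) / m

    def f(x):
        k = min(int((x - a) / delta), m - 1)
        t = (x - a) / delta - k
        return (1 - t) * heights[k] + t * heights[k + 1]

    log_lik = sum(math.log(f(x)) for x in sample)
    # Second differences approximate f'' at the interior nodes.
    penalty = sum(
        ((heights[k - 1] - 2 * heights[k] + heights[k + 1]) / delta ** 2) ** 2
        for k in range(1, m)
    ) * delta
    return log_lik - alpha2 * penalty


def normalize(values, a, b):
    """Rescale interior heights so the piecewise-linear f integrates to 1."""
    delta = (b - a) / (len(values) + 1)
    area = sum(values) * delta            # trapezoid rule with zero end heights
    return [v / area for v in values]


if __name__ == "__main__":
    a, b = 0.0, 1.0
    sample = [0.3, 0.5, 0.7]
    smooth = normalize([math.sin(math.pi * k / 10) for k in range(1, 10)], a, b)
    jagged = normalize([1.0 + 0.5 * (-1) ** k for k in range(1, 10)], a, b)
    for name, cand in (("smooth", smooth), ("jagged", jagged)):
        print(name, penalized_log_likelihood(cand, a, b, sample, alpha2=10.0))
```

The penalty term is what rules out rough candidates: without it the likelihood alone can be driven up without bound by concentrating spikes at the observations.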


Recently, a discretized approximation to the solution of (12) has been algorithmitized
and investigated by Scott [10], [11]. This work suggests

Theorem. If $\hat{f}_n(\cdot)$ is the solution to the MPLE criterion and $f_T \in H^s(a,b)$, then

$$\int_a^b E\left[ \left( \hat{f}_n(x) - f_T(x) \right)^2 \right] dx \to 0, \tag{13}$$

where $f_T(\cdot)$ is the density f truncated to (a,b).


From a practical standpoint, the performance of $\hat{f}_n(\cdot)$ is relatively insensitive to the
selection of the design parameters $\alpha_k$. If we set all the $\alpha_k = 0$ except for $\alpha_2$, it is
not unusual for a change of $\alpha_2$ by a factor of 100 from the optimal to increase the IMSE by
less than a factor of 2.


In Table 1, we compare the IMSE of the MPLE with that of the popular Gaussian kernel estimator
for various densities and sample sizes. Of special note is the fact that although we have
used the optimal (and unobtainable) design parameter for the kernel estimator, we have used
the suboptimal value of $\alpha_2 = 10$ throughout for the MPLE estimator.
TABLE 1

IMSE Values of the MPLE ($\alpha_2 = 10$) and Gaussian Kernel Density Estimation
(with optimal h) for Various Distributions and Sample Sizes.

Density                        n      MPLE IMSE    Kernel IMSE

N(0,1)                         25     .0027        .0041
                               100    .00079       .00129
                               400    .00033       .00053

½N(-1.5,1) + ½N(1.5,1)         25     .00159       .00128
                               100    .00054       .00052

[density label illegible]      25     .00282       .00475
                               100    .00084       .00157


3. CONCLUSIONS


The supposed optimality of classical parametric density estimation procedures is
frequently invalid because the true functional form of the density is unknown.
Nevertheless, we can attack the more general and practical problem of estimating a density
of unknown functional form. The maximum penalized likelihood density estimator has been
algorithmitized and is now a part of standard statistical software [11].